Preprocessing: A Prerequisite for Discovering Patterns in WUM Process

Authors

  • C. Ramya
  • K. S. Shreedhara
  • G. Kavitha
Abstract

Web log data is usually diverse and voluminous. It must be assembled into a consistent, integrated and comprehensive view before it can be used for pattern discovery. Without proper cleaning, transformation and structuring of the data prior to analysis, one cannot expect to find meaningful patterns. As in most data mining applications, data preprocessing involves removing and filtering redundant and irrelevant data, removing noise, transforming the data and resolving any inconsistencies. This paper proposes a complete preprocessing methodology comprising merging, data cleaning, user/session identification, and data formatting and summarization, which improves the quality of the data by reducing its quantity. To validate the efficiency of the proposed methodology, several experiments were conducted; the results show that it reduces the size of Web access log files down to 73-82% of the initial size and offers richer logs that are structured for further stages of Web Usage Mining (WUM). Preprocessing of raw data in the WUM process is thus the central theme of this paper.

Keywords: Data Preprocessing, Web log data, Web usage mining, User/Session identification.
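The four activities named in the abstract (merging, data cleaning, user/session identification, formatting and summarization) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the log format or the heuristics used, so the Combined Log Format, the list of irrelevant file suffixes, the IP-plus-user-agent definition of a user, the 30-minute session timeout, and all function names below are assumptions chosen for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumption: entries follow the Common/Combined Log Format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)
# Assumption: embedded resources are irrelevant for usage analysis.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
# Assumption: a gap longer than 30 minutes starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["agent"] = record["agent"] or ""
    return record


def merge(*logs):
    """Merging: combine several log sources into one time-ordered list."""
    records = [r for log in logs for r in map(parse_line, log) if r is not None]
    return sorted(records, key=lambda r: r["time"])


def clean(records):
    """Data cleaning: keep successful GET requests for actual pages."""
    return [
        r for r in records
        if r["method"] == "GET"
        and 200 <= r["status"] < 300
        and not r["url"].lower().split("?")[0].endswith(IRRELEVANT_SUFFIXES)
    ]


def sessionize(records):
    """User/session identification: group by (ip, agent), split on timeout."""
    sessions = {}
    for r in records:
        user_sessions = sessions.setdefault((r["ip"], r["agent"]), [])
        if user_sessions and r["time"] - user_sessions[-1][-1]["time"] <= SESSION_TIMEOUT:
            user_sessions[-1].append(r)
        else:
            user_sessions.append([r])
    return sessions


def summarize(sessions):
    """Formatting and summarization: one compact row per session."""
    return [
        {"user": user, "start": s[0]["time"], "pages": [r["url"] for r in s]}
        for user, user_sessions in sessions.items()
        for s in user_sessions
    ]
```

Calling `summarize(sessionize(clean(merge(log_a, log_b))))` on two lists of raw log lines yields one row per session. Each stage discards or condenses records, which is the sense in which such a pipeline trades quantity of data for quality.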

Similar resources

Preprocessing: A Prerequisite for Discovering Patterns in Web Usage Mining Process

Web log data is usually diverse and voluminous. This data must be assembled into a consistent, integrated and comprehensive view, in order to be used for pattern discovery. Without properly cleaning, transforming and structuring the data prior to the analysis, one cannot expect to find meaningful patterns. As in most data mining applications, data preprocessing involves removing and filtering r...

ARPN Journal of Science and Technology: An Effective Web Usage Analysis using Fuzzy Clustering

Nowadays, the internet is a useful source of information in everyone’s daily activity. This has led to huge growth of the World Wide Web in its quantity of interchange and in the size and complexity of websites. Web Usage Mining (WUM) is one of the main applications of data mining, artificial intelligence and so on to web data, used to forecast users’ visiting behaviors and obtain their intere...

Exploiting Knowledge Representation for Pattern Interpretation

Web Usage Mining (WUM) is the application of data mining techniques over web server logs in order to extract navigation usage patterns. Semantic Web Usage Mining aims at combining the Semantic Web and WUM. The main goal of the Semantic WUM is to improve the process and the results of WUM by exploiting the new semantic structure in the Web. Pattern analysis is a critical phase in WUM, for two ma...

A Novel Approach for User Navigation Pattern Discovery and Analysis for Web Usage Mining

Websites on the internet are a useful source of information in our day-to-day activity. Web Usage Mining (WUM) is one of the major applications of data mining, artificial intelligence and so on to web data, used to predict users’ visiting behaviours and obtain their interests by analyzing the patterns. WUM has turned out to be one of the considerab...

An Efficient Preprocessing Methodology for Discovering Patterns and Clustering of Web Users using a Dynamic ART1 Neural Network

In this paper, a complete preprocessing methodology for discovering patterns in web usage mining process to improve the quality of data by reducing the quantity of data has been proposed. A dynamic ART1 neural network clustering algorithm to group users according to their Web access patterns with its neat architecture is also proposed. Several experiments are conducted and the result...

Journal:
  • CoRR

Volume: abs/1104.2284

Publication date: 2011